Abstract

We propose a Bayesian model of iterative learning on social networks that is computationally 
tractable; the agents of this model are fully rational, and their calculations can be performed with 
modest computational resources for large networks. Furthermore, learning is efficient, in the 
sense that the process results in an information-theoretically optimal belief. This result extends 
Condorcet's Jury Theorem to general social networks, preserving rationality, computational 
feasibility and efficient learning. 

The model consists of a group of agents who belong to a social network, so that a pair of 
agents can observe each other's actions only if they are neighbors. We assume that the network 
is connected and that the agents have full knowledge of the structure of the network, so that 
they know the members of the network and their social connections. 

The agents try to estimate some state of the world S (say, the price of oil a year from
today). Each agent has a private measurement: an independently acquired piece of information
regarding S. This is modeled, for agent v, by a number $S_v$ picked from a Gaussian distribution
with mean S and standard deviation one. Accordingly, agent v's prior belief regarding S is a
normal distribution with mean $S_v$ and standard deviation one.

The agents start acting iteratively. At each iteration, each agent takes the optimal action 
given its current belief. This action reveals its mean estimate of S to its neighbors. Then, 
observing its neighbors' actions, each agent updates its belief, using Bayes' Law. 

We show that this process is efficient: all the agents converge to the belief that they would 
have, had they access to all the private measurements. Additionally, and in contrast to other 
iterative Bayesian models on networks, it is computationally efficient, so that each agent's 
calculation can be easily carried out. 

1 Introduction 



1.1 Background 

The basic premise of mathematical behavioral economics is that human beings are rational. A 
natural model of rationality, when faced with uncertainty, is Bayesian inference together with a 
utility function that depends on the state of the world and the actions of the individuals. This 
model, from its first days, ran into the difficulties of computational intractability [11].

An obvious source of difficulty is the complexity of the world; it is infeasible to even represent a
probability distribution over the possible states of our world. But even when the state of the world
is taken to be very simple - binary for example - the computational challenge an individual faces
may still prove insurmountable, when social networks are taken into account [8].

*Weizmann Institute and U.C. Berkeley. E-mail: mossel@stat.berkeley.edu. Supported by a Sloan fellowship in
Mathematics, by BSF grant 2004105, by NSF Career Award (DMS 054829), by ONR award N00014-07-1-0506 and
by ISF grant 1300/08.

†Weizmann Institute. Supported by ISF grant 1300/08.

Research in this field has been characterized by a tension between rationality and tractability:
Rational models are in general intractable, and therefore unrealistic, whereas other (usually
boundedly-rational) models are somewhat arbitrary.

For some network geometries the problem is easy. For example, when the network graph is a 
clique, Condorcet's Jury Theorem [5], a founding work in this field, states that when each individual 
receives a weak, independent, binary signal on the state of the world, the group can aggregate their 
information using Majority Rule to recover the true state of the world with high probability, if 
the group is large enough. More specifically, the probability of correct recovery goes to one as the 
number of agents goes to infinity, a property of the model known as asymptotic learning. 
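
As a rough numerical illustration of asymptotic learning under Majority Rule, the sketch below computes the exact probability that a strict majority of n independent voters is correct. The function name `majority_correct_probability` and the restriction to odd n (so that ties cannot occur) are assumptions made for this illustration and are not taken from [5].

```python
from math import comb

def majority_correct_probability(n: int, p: float) -> float:
    """Probability that a strict majority of n independent voters is correct.

    Each voter is correct with probability p; n is assumed to be odd.
    """
    smallest_majority = n // 2 + 1
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(smallest_majority, n + 1)
    )

# A weak signal (p = 0.6) aggregated over increasingly large groups.
for group_size in (1, 11, 101):
    print(group_size, majority_correct_probability(group_size, 0.6))
```

For a fixed p above one half, the printed probability increases with the group size and approaches one.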

The history of Social Learning on Networks can be viewed as an attempt to find extensions of 
Condorcet's Jury Theorem, to setups where direct interaction is allowed only between some of the 
agents, so that the structure of social relationships is given by a connected network. An elusive 
goal has been finding a model that is rational, tractable and results in asymptotic learning, as 
Condorcet's does. 

First models, such as the DeGroot model [6], consider iterative network processes, where each
node performs the computationally simple task of averaging its current distribution for the state 
of the world with that of its neighbors. This leads to convergence of all agents to the same value, 
which is the average of the original beliefs, as weighted by the degree. This model is elegant, but 
leaves more to be desired; the nature of the utility maximized by the actions of the different agents 
is unclear, and so it is hard to see how this model can be classified as rational. Additionally, 
convergence to a weighted average may be considered as suboptimal, as the true average of the 
signals is a better estimate of the original signal under standard statistical assumptions. This 
is most apparent in networks where some nodes have degree which is proportional to the total 
number of connections in the network. In such networks, no matter how large they are, with 
constant probability the estimates will converge to values which are bounded away from the true 
ones. 
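
To make this suboptimality concrete, the following sketch compares the variance of a degree-weighted average with that of the simple average. It assumes independent signals of unit variance (as in the model of this paper, not necessarily in [6]); the helper names `degree_weighted_variance` and `star_degrees` are hypothetical.

```python
from fractions import Fraction

def degree_weighted_variance(degrees):
    """Variance of sum_v (d_v / sum_u d_u) * S_v for independent unit-variance signals S_v."""
    total = sum(degrees)
    return sum(Fraction(d, total) ** 2 for d in degrees)

def star_degrees(n):
    """Degree sequence of a star on n nodes: one center of degree n - 1 and n - 1 leaves."""
    return [n - 1] + [1] * (n - 1)

for n in (5, 50, 500):
    weighted = degree_weighted_variance(star_degrees(n))
    # Compare with 1/n, the variance of the simple average of n signals.
    print(n, float(weighted), 1 / n)
```

On a star the center carries weight 1/2 regardless of n, so the variance of the degree-weighted average stays above 1/4, while the variance of the simple average is 1/n.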

A modern approach to the problem is in the Bayesian setup. Founding work in the Bayesian 
realm (e.g., [3], [1] and [12]) focused on chains of individuals, each of whom, in turn and according
to a set order, chooses an action based on private information and past actions. Similar models, 
in which agents have limited sets of social ties, have also been devised, e.g. pp. Of course these 
models are somewhat limited in their modeling power, as most realistic network interactions are 
iterative, and so agents learn and act over and over again. 

This problem is partially addressed in the work of Bala and Goyal [2], where iterative interactions
are allowed. They have shown that for some priors and some network structures, the agents converge 
to an optimal action. However, their model assumes that in each round each agent may receive 
an independent signal of the state of the world, and thus a potentially unbounded amount of 
information may be available to an individual. Furthermore, they bound the rationality of the 
agents for various reasons, one of which is that fully rational agents would have to carry out 
intractable computations. 

Gale and Kariv [8] propose a model in which each agent receives an independent signal at the
beginning of the process, and thenceforth all agents act and observe simultaneously and repeatedly, 
improving their knowledge of the state of the world with every iteration. The model is completely 
Bayesian in the sense that the agents are Bayesian and their actions are aimed at maximizing the 
expected utility at each round. It requires the agents to know the structure of the network graph. 

This is a natural model for studying the paradigm of the Condorcet Jury Theorem in a network
setup. It is shown in [8] that all the beliefs of the agents converge, and that different agents
cannot converge to different actions. However, their results do not rule out the
possibility that the actions will not converge at all. Additionally, their model, in its full generality,
is computationally intractable.

1.2 Our Contribution 

We consider a model which is a variant of this model. We insist that the action space is infinite 
and indeed that the action of each agent reveals the mean of its belief to its neighbors. We further 
assume that the original signals are Gaussian. 

In this model the state of the world is S, a real number. The agents each have a private 
Gaussian measurement of S. Iteratively, they pick an action, and learn their neighbors' estimate 
for the expected value of S from their actions. Then they perform a fully Bayesian calculation 
which results in a new estimate of S. 

We prove that this model satisfies the strongest possible Condorcet Jury Theorem:

• The agents are fully rational and the agents' calculations are tractable.

• For every connected network, all agents converge to the same belief, and this belief
is optimal, i.e. the belief given by the average of the original signals.

• Finally, convergence takes place in a finite number of rounds - in fact the number of rounds is
at most twice the number of agents in the network times the diameter of the network.

After the first draft of our paper was circulated [10], Marcus Mobius (whom we would like to
thank) brought to our attention that essentially the same model is briefly mentioned in a paper
by DeMarzo, Vayanos and Zwiebel [7], who prove that convergence takes place after a number of
rounds equal to the square of the number of agents. It appears that the main reason that [7] did not
devote more attention to the model was computational, as they write: "We should emphasize that
the calculations that agents must perform even in this simple case where the network is common
knowledge can be very complicated." In contrast, we show that the computations are efficient
- both theoretically (each update involves linear algebra manipulations in dimension n) and in
practice (we have efficient code that performs all the agents' calculations for large networks).

A second property of this model that DeMarzo et al. found unrealistic is the requirement that 
all agents know the structure of the social network. While indeed this may be difficult to justify for 
some large networks, it is perhaps not strictly necessary; in order to perform their calculations, the 
agents need to know the covariance between the estimators of their neighbors only. In our model, 
they derive this knowledge from the structure of the graph, but in principle it may be derived by 
other means. This observation (which was clarified in discussions with Rafael M. Frongillo, Grant 
Schoenebeck and Adam Kalai, whom we would like to thank) presents an opportunity for follow-up 
work involving some variant of this model, which does not require knowledge of the network, but 
still preserves rationality, tractability and efficient learning. More on this and other future research
is presented in the concluding section.

2 The Model 

The model can be divided into three parts: the agents and their social network, the state of the 
world and its measurements, and the agents' behavior. 
2.1 The Agents and their Social Network

The agents in our model are the nodes of a fixed network of social ties. Formally: 

• The agents are a finite set V. 

• The set of social ties E is a set of pairs of agents, so that {u, v} = {v, u} is in E if agents u 
and v are neighbors. 

• Every pair of agents is connected by a chain of neighbors: even if u and v are not neighbors,
there exists a chain $w_1, \ldots, w_k$ such that $u = w_1$, $v = w_k$, and $w_1$ is a neighbor of $w_2$, $w_2$ is
a neighbor of $w_3$, etc.

Hence G = (V, E) is a finite, simple, connected, undirected graph. Note that this graph does not 
change with time. 
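
One way to represent such a graph in code is as a symmetric adjacency mapping. The sketch below checks the two structural assumptions (undirected and connected) with a breadth-first search; the representation and the function names are illustrative choices, not something the model prescribes.

```python
from collections import deque

def is_undirected(neighbors):
    """True if there are no self-loops and v lists u exactly when u lists v."""
    return all(
        u != v and v in neighbors.get(u, ())
        for v in neighbors
        for u in neighbors[v]
    )

def is_connected(neighbors):
    """True if every agent can be reached from an arbitrary starting agent."""
    if not neighbors:
        return True
    start = next(iter(neighbors))
    seen = {start}
    queue = deque([start])
    while queue:
        v = queue.popleft()
        for u in neighbors[v]:
            if u not in seen:
                seen.add(u)
                queue.append(u)
    return len(seen) == len(neighbors)

# A path on three agents: a - b - c.
path = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
print(is_undirected(path), is_connected(path))
```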

2.2 The State of the World and its Measurements 

The agents reside in a world characterized by some number S. They have some information on S 
which is not certain. 

• Let $S \in \mathbb{R}$ be some state of the world.

• For each agent v, let $S_v$ be v's private measurement, so that $S_v$ is picked from the normal
distribution with mean S and standard deviation one.

• The different $S_v$'s are independent.
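
The measurement model can be simulated in a few lines. The sketch below is only an illustration under the unit-variance assumption used throughout the paper; the function name `sample_measurements`, the seed, and the example value of the state are all hypothetical.

```python
import random

def sample_measurements(state, agents, seed=None):
    """Draw one independent N(state, 1) measurement per agent."""
    rng = random.Random(seed)
    return {v: rng.gauss(state, 1.0) for v in agents}

measurements = sample_measurements(state=3.0, agents=range(1000), seed=0)
average = sum(measurements.values()) / len(measurements)
# The average of n such measurements has standard deviation 1 / sqrt(n).
print(average)
```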

2.3 The Agents' Behavior 

The agents are Bayesian, so they initially have some prior belief regarding S, and update it to a 
posterior belief, according to Bayes' Law, with each additional piece of information they encounter. 
Both prior and posterior beliefs are distributions on the possible values of S. 

In our model each agent v has a different prior, which is the Gaussian distribution with mean
$S_v$ and variance one. An equivalent model, in which all agents initially have identical priors, is
the one in which the common ("improper") prior is the uniform measure on R. After learning the 
private measurement, each agent would update its belief and would at that point have the prior 
belief of the agents in our model. Roughly speaking, this improper prior is well approximated by 
the normal distribution with some extremely large variance. 

At each iteration, each agent picks an action $A_x$, where x is some real number. We assume that
when x is equal to the expectation of an agent's current belief, then $A_x$ is its optimal action. This
can be achieved by, for example, setting the agents' utility to be $U(A_x) = -(S - x)^2$ and assuming
that the agents want to maximize their expected utility. We also assume that one can learn x by
observing that an agent has picked $A_x$. That is, $A_x$ is different from $A_y$ when x is different from y.
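
To see why the mean is the optimal choice under this quadratic utility, one can expand the expected utility under the agent's current belief. The short derivation below is the standard bias-variance decomposition, with $\mu$ denoting the mean of the belief (a symbol introduced only for this illustration).

```latex
\begin{align*}
\mathbb{E}\left[U(A_x)\right]
  &= -\mathbb{E}\left[(S - x)^2\right] \\
  &= -\mathbb{E}\left[(S - \mu)^2\right]
     - 2(\mu - x)\,\mathbb{E}\left[S - \mu\right]
     - (\mu - x)^2 \\
  &= -\mathrm{Var}(S) - (\mu - x)^2 .
\end{align*}
```

The first term does not depend on x, so the expected utility is maximized exactly at $x = \mu$.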

Having carried out some action, each agent observes its neighbors' actions, and calculates a new 
posterior distribution, based on all the information it has come across so far. 

Formally: 

• Agent v's prior belief regarding S is the Gaussian distribution with mean $S_v$ and standard
deviation one. Denote this belief $B_{v,0}$.

• At time $t \in \mathbb{N}$, agent v takes action $A_{x_v(t)}$, where $x_v(t) = \mathbb{E}[S \mid B_{v,t}]$.

• Next, it observes its neighbors' actions, and learns $x_u(t)$ for each of its neighbors u. It then
calculates, using Bayes' Law, a new posterior distribution $B_{v,t+1}$, based on all it has observed
so far - its neighbors' actions in the previous iterations.

Note that the fact that each agent knows the structure of the graph allows it to know what
calculation each of the other agents performed; not the actual numbers involved in the calculation, 
but rather "the formula" that was used. 

This model is similar to the one presented in [9]. The agents in that model, however, were not 
Bayesian and had no memory of their observations in past iterations. 

3 Results 

We prove two results: 

1. Efficient Computation: Each agent's calculation is computationally efficient: it can be achieved 
using simple linear algebra operations, involving matrices whose size is the size of the network. 

2. Efficient Learning: The agents' posterior beliefs all converge to the same value. This is
the value that they would have converged to, had they all had access to each other's private
measurements. Furthermore, the process converges in at most $2n \cdot d$ iterations, where d is the
diameter of the graph.

3.1 Computational Efficiency 

We begin with an informal discussion of the computational aspects of the learning process. The
agents' private signals $S_v$ are normally distributed, with mean S and standard deviation one. When
an agent observes its neighbor's action, it learns the mean of its neighbor's current belief. This is
an estimator of S, which can inductively be shown to also be normally distributed. It can also be
shown to be a weighted average, and hence a linear combination, of the estimators seen so far, and
hence a linear combination of the different $S_v$'s. Knowing the structure of the graph, an agent can
know the coefficients of these linear combinations, coefficients that are independent of the actual
values of the $S_v$'s.

When an agent observes a neighbor's action, it adds to its memory an additional estimator
of S, and in particular one that is a linear combination of the original $S_v$'s. If this estimator is
already in the space spanned by the estimators in the agent's memory, then the agent gains no new 
information. Otherwise, the agent increases the dimension of the space spanned by its memory. 

In this multivariate Gaussian case, an agent's belief, at each iteration, is the unique linear
combination, over a basis of the estimators in its memory, that minimizes its belief's variance while
keeping it unbiased. This calculation involves inverting a k by k matrix, where k is the number
of linearly independent estimators observed so far (so at most n, the number of agents).
This can be done very efficiently, as the more rigorous analysis below establishes.

3.1.1 The agents' calculation 

In this subsection we take the alternative, but equivalent, point of view that the agents' prior is the 
improper uniform measure over the reals, and that their private signals are their first observation. 
It is easy to see that this is indeed equivalent to having the private signal as the prior. 
Let W be the vector space spanned by the different $S_v$'s:

$$W = \left\{ \sum_{v \in V} \beta_v S_v \ \text{ s.t. } \ \forall v : \beta_v \in \mathbb{R} \right\}. \qquad (1)$$

It is easy to convince oneself that this indeed is a vector space of finite dimension. Note that all the
random variables in this space are normally distributed, since a linear combination of independent
normal random variables is in turn normal too. Denote by $W^1$ the subset of unbiased estimators
in W:

$$W^1 = \left\{ \sum_{v \in V} \beta_v S_v \in W \ \text{ s.t. } \ \sum_{v \in V} \beta_v = 1 \right\}. \qquad (2)$$



Theorem 3.1. For all agents w and times t, it holds that $x_w(t) \in W^1$, with $x_w(t) = \sum_v \beta_{wv}(t) S_v$
for some $\beta_{wv}(t)$. Moreover, given the graph structure only, there exists an efficient algorithm to
calculate the coefficients $\beta_{wv}(t)$.

Proof. We shall prove this by induction on t. At time t = 0 the claim is true since $\beta_{wv}(0)$ is one
when w = v and zero otherwise. Assume that the claim is true until time t.

Consider an agent w, and denote by $r_0, \ldots, r_k$ the random variables that agent w has observed
up to time t, with $r_0 = S_w = x_w(0)$. Those are w's own and its neighbors' past estimators, and
so knowledge of the graph structure is required here. By our assumption these are all in $W^1$, and
we can write $r_i = \sum_v \alpha_{iv} S_v$, where the coefficients $\alpha_{iv}$ are a simple re-indexing of the coefficients
$\beta_{wv}(t)$, by some relation that maps each w and t to some i. Since by assumption $r_i \in W^1$, then
$\sum_v \alpha_{iv} = 1$.

Denote by r the vector $(r_0, \ldots, r_k)$, denote by $\mathbf{1}$ the vector $(1, \ldots, 1) \in \mathbb{R}^{k+1}$, and denote by $C_{ij}$
the covariance between $r_i$ and $r_j$, so that $C_{ij} = \sum_v \alpha_{iv} \alpha_{jv}$. Then r's distribution is the normal
multivariate distribution with covariance matrix C and mean $\mathbf{1}S$ (since $r_i \in W^1$), and the likelihood
of S given that agent w has observed r is

$$\mathcal{L}(S \mid r) = p(r \mid S) = \frac{1}{(2\pi)^{(k+1)/2} |C|^{1/2}} \, e^{-\frac{1}{2}(r - \mathbf{1}S)' C^{-1} (r - \mathbf{1}S)}. \qquad (3)$$

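Note that in the case that C is not invertible (equivalently, the entries of r are not linearly independent) we
remove from it (and correspondingly from r) a minimal set of columns and rows such that it
becomes invertible. Consequently, C is never larger than $n \times n$.
The expression $(r - \mathbf{1}S)' C^{-1} (r - \mathbf{1}S)$ can be rewritten as

$$\mathbf{1}' C^{-1} \mathbf{1} \left( S - \frac{\mathbf{1}' C^{-1} r}{\mathbf{1}' C^{-1} \mathbf{1}} \right)^2 + B,$$

with B a normalization factor that does not depend on S. Denote by $\gamma$ the vector $\frac{C^{-1} \mathbf{1}}{\mathbf{1}' C^{-1} \mathbf{1}}$. Note that $\sum_i \gamma_i = 1$.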
To calculate its posterior distribution, agent w uses Bayes' Law. Then

$$p(S \mid r) = \frac{p(r \mid S) \cdot p(S)}{p(r)} \propto \mathcal{L}(S \mid r),$$

because the prior distribution $p(\cdot)$ is uniform. This can be written as

$$p(S \mid r) = \frac{1}{\sqrt{2\pi\tau^2}} \, e^{-(S - x)^2 / 2\tau^2} \qquad (4)$$

where

$$x = \frac{\mathbf{1}' C^{-1} r}{\mathbf{1}' C^{-1} \mathbf{1}} = \sum_i \gamma_i r_i \qquad (5)$$

and

$$\tau^2 = \frac{1}{\mathbf{1}' C^{-1} \mathbf{1}}.$$

Note that x is a linear combination of the observations that w made up to time t. The expected
value of this distribution (4) is x, and therefore $x_w(t + 1) = x$. Then

$$x_w(t + 1) = \sum_i \gamma_i \sum_v \alpha_{iv} S_v \qquad (6)$$

and therefore

$$\beta_{wv}(t + 1) = \sum_i \gamma_i \alpha_{iv}. \qquad (7)$$

Since $\sum_i \gamma_i = 1$ and $\sum_v \alpha_{iv} = 1$, then $\sum_v \beta_{wv}(t + 1) = 1$. We have shown then that $x_w(t + 1) \in W^1$.
We have also shown that to calculate $\beta_{wv}(t + 1)$, given the coefficients at time t, one need only
invert n matrices (one for each agent), of size at most $n \times n$ - certainly an efficient calculation.
Furthermore, no knowledge of the $S_v$'s is needed, but only of the graph structure. □
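
As a small worked instance of the update in the proof (the particular observations are chosen here for illustration only), suppose agent w has observed $r_0 = S_w$ and $r_1 = \frac{1}{2}(S_w + S_u)$ for some other agent u, with all signals of unit variance. Then:

```latex
\begin{align*}
C &= \begin{pmatrix} 1 & \tfrac{1}{2} \\ \tfrac{1}{2} & \tfrac{1}{2} \end{pmatrix}, &
C^{-1} &= \begin{pmatrix} 2 & -2 \\ -2 & 4 \end{pmatrix}, &
C^{-1}\mathbf{1} &= \begin{pmatrix} 0 \\ 2 \end{pmatrix}, \\
\mathbf{1}' C^{-1} \mathbf{1} &= 2, &
\gamma &= \begin{pmatrix} 0 \\ 1 \end{pmatrix}, &
x &= r_1, \quad \tau^2 = \tfrac{1}{2}.
\end{align*}
```

The agent discards its own raw signal in favor of the two-signal average, whose variance 1/2 is the best achievable from two independent unit-variance signals.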

All the agents, if they know the graph structure, can perform this calculation efficiently. In
particular an agent can calculate its coefficients vector $\gamma$, and from it calculate its next estimator
for time t + 1, given that it has performed the calculation above up to time t.
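
The calculation described in this subsection can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: signals have unit variance, the graph is an adjacency mapping, exact rational arithmetic (`fractions.Fraction`) is used so that linear dependence can be tested without rounding error, and all function names (`solve`, `gram`, `initial_state`, `step`) are hypothetical.

```python
from fractions import Fraction

def solve(matrix, rhs):
    """Solve matrix * y = rhs exactly by Gauss-Jordan elimination; None if singular."""
    size = len(matrix)
    aug = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(size):
        pivot = next((r for r in range(col, size) if aug[r][col] != 0), None)
        if pivot is None:
            return None
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [entry / scale for entry in aug[col]]
        for r in range(size):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [a - factor * p for a, p in zip(aug[r], aug[col])]
    return [row[-1] for row in aug]

def gram(rows):
    """Covariance matrix C_ij = sum_v alpha_iv * alpha_jv for unit-variance signals."""
    return [[sum(a * b for a, b in zip(r, s)) for s in rows] for r in rows]

def ones(k):
    return [Fraction(1)] * k

def initial_state(neighbors):
    """beta[w] starts as the indicator vector of w; memory[w] holds w's independent observations."""
    nodes = sorted(neighbors)
    beta = {w: [Fraction(int(w == v)) for v in nodes] for w in nodes}
    memory = {w: [beta[w]] for w in nodes}
    return beta, memory

def step(neighbors, beta, memory):
    """One synchronous round: every agent observes its neighbors and re-estimates."""
    new_beta, new_memory = {}, {}
    for w in neighbors:
        basis = list(memory[w])
        for u in neighbors[w]:
            trial = basis + [beta[u]]
            # A singular Gram matrix means the new estimator adds no information.
            if solve(gram(trial), ones(len(trial))) is not None:
                basis = trial
        y = solve(gram(basis), ones(len(basis)))          # y = C^{-1} 1
        gamma = [value / sum(y) for value in y]           # gamma = y / (1' C^{-1} 1)
        new_beta[w] = [sum(g * row[v] for g, row in zip(gamma, basis))
                       for v in range(len(beta[w]))]
        new_memory[w] = basis
    return new_beta, new_memory

# A path on three agents: 0 - 1 - 2.
path = {0: [1], 1: [0, 2], 2: [1]}
beta, memory = initial_state(path)
for _ in range(2):
    beta, memory = step(path, beta, memory)
print(beta)
```

Only the graph enters the computation; the measurements themselves are never needed to obtain the coefficients. On the three-agent path, two rounds suffice for every agent's coefficient vector to become the uniform average.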

3.2 Learning Efficiency 

3.2.1 Convergence in $n^2$

To show that the beliefs of the agents converge, we need only note that, being conditional probabilities
over increasingly large probability spaces, these beliefs are martingales. Then, because these
martingales are bounded in $L^2$, they converge. However, the following proof, which does not require
the power of martingales, shows that convergence in fact takes place in at most $n^2$ iterations, and
that furthermore all agents converge to the same belief. This proof is similar to the one presented
by DeMarzo et al. [7].

When two neighboring agents have different beliefs, then at least one of them will learn from the
other and improve its estimator: Assume agents u and v are neighbors with different estimators,
and agent v's belief has variance lower than or equal to that of agent u. Then agent v's estimator is
necessarily not in the space spanned by the estimators previously seen by u, since u's estimator is the
unique minimum-variance unbiased estimator in that space. Hence the dimension
spanned by u's memory will increase at this iteration. We have thus shown that in each iteration,
unless all the agents have the same estimator, at least one of them increases the dimension of its
space by at least one. Since the maximum dimension possible is n, and there are n agents, convergence will occur after
at most $n^2$ steps, and all agents will converge to the same belief.

3.2.2 Convergence in $2n \cdot d$ iterations

A slightly more subtle argument proves a better bound for the convergence rate, namely $2n \cdot d$,
where d is the diameter of the graph. The idea of the proof is that the current estimator of an
agent u cannot remain unchanged for many steps, unless a growing neighborhood around u also
remains stagnant. The formal proof uses the following lemma.
Lemma 3.2. If some agent's estimator has not changed for 2d steps then the process has converged.

Proof. Assume agent u's estimator does not change from iteration $t_0$ to $t_0 + 2d$, so that

$$x_u(t_0) = x_u(t_0 + 1) = \cdots = x_u(t_0 + 2d).$$

Denote $x := x_u(t_0) = \cdots = x_u(t_0 + 2d)$, and let U be the space spanned by the estimators in u's
memory at time $t_0 + 2d$. Then by definition of the process x is the optimal unbiased estimator in
U.

Let w be a neighbor of u. Then w's estimator at time $t_0 + 1$, $x_w(t_0 + 1)$, is in U, since u observes
$x_w(t_0 + 1)$ at time $t_0 + 2$. Now x by definition is better than any estimator in U, and so, since w
has observed x at time $t_0$, it must be that $x_w(t_0 + 1) = x$. By the same argument $x_w(t) = x$ for
$t_0 + 1 \le t \le t_0 + 2d - 1$.

Applying this argument inductively, it follows that at times $t_0 + i \le t \le t_0 + 2d - i$ all the agents
at distance i from u have estimator x, and so at time $t_0 + d$ all agents have the same estimator.

Recalling that at each iteration an agent's estimator is a weighted average of those of its neighbors,
we conclude that all nodes will have estimator x for all times $t \ge t_0 + d$. The proof follows.

□

Theorem 3.3. The process stops after at most $2n \cdot d$ iterations.

Proof. Every time an agent's estimator changes, the dimension of the space the agent's memory
spans increases by at least one, and so this cannot happen more than n times. Since this must
happen at least once every 2d steps as long as the process hasn't converged, the process must stop after
at most $2n \cdot d$ iterations. □
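
For a concrete graph, the two upper bounds of this section can be compared directly. The sketch below computes the diameter by breadth-first search and returns both bounds; it assumes a connected graph given as an adjacency mapping, and the function names are hypothetical.

```python
from collections import deque

def eccentricity(neighbors, source):
    """Largest breadth-first-search distance from source to any other node."""
    distance = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for u in neighbors[v]:
            if u not in distance:
                distance[u] = distance[v] + 1
                queue.append(u)
    return max(distance.values())

def diameter(neighbors):
    """Diameter d of a connected graph given as an adjacency mapping."""
    return max(eccentricity(neighbors, v) for v in neighbors)

def round_bounds(neighbors):
    """Return the pair (n^2, 2 * n * d) of upper bounds on the number of iterations."""
    n, d = len(neighbors), diameter(neighbors)
    return n * n, 2 * n * d

star = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0]}
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(round_bounds(star), round_bounds(path))
```

As a matter of arithmetic, $2n \cdot d$ is the smaller of the two bounds exactly when d < n/2, which holds for the star above but not for the path.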

3.2.3 Convergence to the Optimum 

At any particular iteration, any node v contains $S_v$ in the space of its estimators. At each iteration
the estimator at v is then of the form $a S_v + b \tilde{S}$, where $\tilde{S}$ is an unbiased linear estimator based on
some of the signals other than $S_v$, and a + b = 1. Note that the variance of this estimator is $a^2 + b^2 \mathrm{Var}(\tilde{S})$
and it is minimized when $a = \mathrm{Var}(\tilde{S})/(1 + \mathrm{Var}(\tilde{S}))$. Since $\tilde{S}$ depends at most on the $n - 1$ signals other than $S_v$, its
variance is at least $1/(n - 1)$ and therefore a is at least 1/n.
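
The weight on an agent's own signal follows from a one-line minimization. Writing V for the variance of the second estimator and using b = 1 - a and the unit variance of the agent's own signal (the symbols V and f are introduced here only for this derivation):

```latex
\begin{align*}
f(a) &= a^2 + (1 - a)^2 V, \\
f'(a) &= 2a - 2(1 - a) V = 0
  \quad\Longrightarrow\quad a = \frac{V}{1 + V}, \\
V \ge \frac{1}{n - 1}
  &\quad\Longrightarrow\quad
  a = \frac{V}{1 + V} \ge \frac{1/(n - 1)}{1 + 1/(n - 1)} = \frac{1}{n}.
\end{align*}
```

The last step uses the fact that $V/(1 + V)$ is increasing in V.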

Hence all the agents, at all iterations, give their own estimators weight which is at least 1/n. 
Since they all converge to the same estimator, and since the sum of the weights in this estimator 
must be one (since it, too, is unbiased), then the weights must all be 1/n, and the limiting estimator 
is the simple average of the private measurements, as stated. 

4 Conclusion and Future Work 

In this work we presented a first learning model on social networks which is both rational and
computationally efficient. Our work raises a number of future research directions which we briefly
discuss.

4.1 Network Structure 

A shortcoming of the model introduced here is the assumption that all agents have complete access 
to the network structure. Future work may relax this assumption in a number of ways. In particular: 
4.1.1 Random Networks

In the Bayesian literature it is common to model the unknown network structure by a common prior 
distribution on network structures. One may model this by picking the network randomly from 
some distribution, which is known to the agents. The agents then proceed similarly, calculating 
conditional expectations Bayesianly, and incorporating into their posterior distributions whatever 
information they may have gathered about the structure of the network. 

While there are many options for choosing such a distribution over networks, an interesting 
challenge is to see for which distributions the agents' calculations remain tractable. We leave this 
as an open question. We note only that the standard martingale arguments in the area [12] for
convergence remain valid for any distribution of connected networks and that therefore, in any
case, we have convergence in this scenario, too.

4.1.2 Learning Covariance Structure 

A different approach is to study natural mechanisms by which neighbors learn the covariance 
between their neighbors' signals at different iterations. As we noted above, this knowledge is 
sufficient for calculating optimal estimators, even without any understanding of the graph structure. 
We have begun to explore such models, and their computational feasibility, together with Rafael 
M. Frongillo and Grant Schoenebeck. 

4.2 Convergence Rate 

The results established in this paper show convergence in $O(n \cdot d)$. A natural question is whether
this bound can be improved. Certainly, convergence cannot happen faster than $O(d)$ - the time
it takes information to propagate through the network. For binary trees, where the diameter is
$O(\log n)$, convergence does happen in $O(d)$, as it does for cliques and stars. However, simulations
have led us to believe that convergence in general is not that fast, and requires - we conjecture -
$O(n)$ steps.

In our simulations we sampled a population of regular graphs of degree three and diameter log n. 
The result almost always was convergence in time n/3, with every agent increasing the dimension 
of the space its memory spanned by three, at every iteration. This may hint that convergence 
time may, in some sense, be inversely proportional to the degrees of the graph vertices. We thus 
conclude with the following conjecture and open problem: 

Conjecture 4.1. For any graph the learning process converges in $O(n)$ iterations.

Open Problem 4.2. Does the process converge in $O(n/d^*)$ iterations for all graphs, where $d^*$ is
the minimal degree of the graph?

5 Acknowledgments 

We thank Shachar Kariv for introducing us to the literature on learning on networks and for 
fascinating discussions. Thanks go to Marcus Mobius for an interesting discussion and for directing 
us to the work of DeMarzo et al., after a draft of this paper had been circulated. We also thank 
Grant Schoenebeck, Rafael M. Frongillo and Adam Kalai for interesting discussions regarding 
follow-up work. Finally, we are indebted to Yaron Singer for much support and helpful suggestions 
on writing the results. 






References 

[1] D. Acemoglu, M. A. Dahleh, I. Lobel, and A. Ozdaglar. Bayesian learning in social networks. 
Preprint, 2008. 

[2] V. Bala and S. Goyal. Learning from neighbours. Review of Economic Studies, 65(3):595-621, 
July 1998. 

[3] A. V. Banerjee. A simple model of herd behavior. The Quarterly Journal of Economics, 
107(3):797-817, 1992. 

[4] S. Bikhchandani, D. Hirshleifer, and I. Welch. A theory of fads, fashion, custom, and cultural change
as informational cascades. Journal of Political Economy, 100(5):992-1026, 1992.

[5] J.-A.-N. Condorcet. Essai sur l'application de l'analyse à la probabilité des décisions rendues
à la pluralité des voix. De l'Imprimerie Royale, 1785.

[6] M. H. DeGroot. Reaching a consensus. Journal of the American Statistical Association, 
69(345):118-121, 1974. 

[7] P. DeMarzo, D. Vayanos, and J. Zwiebel. Persuasion bias, social influence, and unidimensional 
opinions. Quarterly Journal of Economics, 118:909-968, 2003. 

[8] D. Gale and S. Kariv. Bayesian learning in social networks. Games and Economic Behavior, 
45(2):329-346, November 2003. 

[9] E. Mossel and O. Tamuz. Iterative maximum likelihood on networks. To appear in the 
proceedings of Allerton, 2009. 

[10] E. Mossel and O. Tamuz. Efficient Bayesian learning in social networks with Gaussian estimators.
Preprint at http://arxiv.org/abs/1002.0747, 2010.

[11] H. Simon. Reason in Human Affairs. Stanford University Press, 1982. 

[12] L. Smith and P. Sorensen. Pathological outcomes of observational learning. Econometrica, 
68(2):371-398, 2000. 






